
    A Comparison of some recent Task-based Parallel Programming Models

    The need for parallel programming models that are simple to use and at the same time efficient for current and future parallel platforms has led to recent attention to task-based models such as Cilk++, Intel TBB and the task concept in OpenMP version 3.0. The choice of model and implementation can have a major impact on the final performance, and in order to understand some of the trade-offs we have made a quantitative study comparing four implementations of OpenMP (gcc, Intel icc, Sun Studio and the research compiler Mercurium/nanos mcc), Cilk++ and Wool, a high-performance task-based library developed at SICS. We use microbenchmarks to characterize the costs of task creation and stealing, and the Barcelona OpenMP Tasks Suite to characterize application performance. Wool and Cilk++ have by far the lowest overhead in both spawning and stealing tasks. This is reflected in application performance when many tasks with small granularity are spawned, where Cilk++ and Wool have the highest performance. For coarse-granularity applications, the OpenMP implementations perform similarly to the more light-weight Cilk++ and Wool, except for one application where mcc is superior thanks to a better task scheduler. The OpenMP implementations are generally not yet ready for use when the task granularity becomes very small. There is no inherent reason for this, so we expect future implementations of OpenMP to focus on this issue.
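    The spawn-and-recurse pattern these task-based models share can be illustrated with a minimal sketch. This is not code from the paper: plain Python threads stand in for a work-stealing runtime, and the cutoff parameter mirrors the task-granularity control that drives the overhead results described above.

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def fib(n):
        # Sequential baseline: no task-creation overhead at all.
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    def fib_tasks(pool, n, cutoff):
        # Spawn one branch as a task and recurse inline on the other,
        # as Cilk++/Wool/OpenMP-task programs typically do. Below the
        # cutoff we fall back to the sequential version so that
        # task-creation cost does not dominate -- the granularity
        # effect the study measures.
        if n < cutoff:
            return fib(n)
        left = pool.submit(fib_tasks, pool, n - 1, cutoff)
        right = fib_tasks(pool, n - 2, cutoff)
        return left.result() + right

    with ThreadPoolExecutor(max_workers=16) as pool:
        result = fib_tasks(pool, 16, cutoff=12)
    print(result)  # 987, same as the sequential fib(16)
    ```

    Lowering the cutoff spawns many more, smaller tasks, which is exactly the regime in which the measured per-task overheads of the different runtimes start to dominate total run time.
    
    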

    Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server

    In the last decade, data analytics has rapidly progressed from traditional disk-based processing to modern in-memory processing. However, little effort has been devoted to enhancing performance at the micro-architecture level. This paper characterizes the performance of in-memory data analytics using the Apache Spark framework. We use a single-node NUMA machine and identify the bottlenecks hampering the scalability of workloads. We also quantify the inefficiencies at the micro-architecture level for various data analysis workloads. Through empirical evaluation, we show that Spark workloads do not scale linearly beyond twelve threads, due to work time inflation and thread-level load imbalance. Further, at the micro-architecture level, we observe memory-bound latency to be the major cause of work time inflation. Comment: Accepted to the 5th IEEE International Conference on Big Data and Cloud Computing (BDCloud 2015).

    Adaptive and Flexible Dictionary Code Compression for Embedded Applications

    Dictionary code compression is a technique where long instructions in memory are replaced with shorter code words used as indices into a table that holds the original instructions. We present a new view of dictionary code compression for moderately high-performance processors for embedded applications. Previous work on dictionary code compression has shown decent performance and energy savings, which we verify with our own measurements that are more thorough than previously published ones. We also augment previous work with a more thorough analysis of the effects of cache and line size changes. In addition, we introduce the concept of aggregated profiling, which allows two or more programs to share the same dictionary contents. Finally, we introduce dynamic dictionaries, where the dictionary contents are considered part of the context of a process, and show that the performance overhead of reloading the dictionary contents on a context switch is negligible, while at the same time considerable energy can be saved with more specialized dictionary contents.
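    The basic mechanism can be sketched as follows. This is an illustration only, not the paper's actual encoding: the dictionary size, the code-word layout, and the toy instruction strings are all invented.

    ```python
    from collections import Counter

    def build_dictionary(instructions, size):
        # Profile the program: the most frequent instruction words
        # get the short code words (here, their table index).
        return [word for word, _ in Counter(instructions).most_common(size)]

    def compress(instructions, dictionary):
        index = {word: i for i, word in enumerate(dictionary)}
        # ('C', i): short code word, an index into the dictionary.
        # ('U', w): instruction not in the dictionary, stored verbatim.
        return [('C', index[w]) if w in index else ('U', w)
                for w in instructions]

    def decompress(stream, dictionary):
        # Decompression is a single table lookup per code word.
        return [dictionary[p] if tag == 'C' else p for tag, p in stream]

    program = ['add r1,r2', 'ld r3,[r1]', 'add r1,r2',
               'st [r3],r1', 'add r1,r2']
    table = build_dictionary(program, size=2)
    packed = compress(program, table)
    assert decompress(packed, table) == program  # lossless round trip
    ```

    Aggregated profiling, in these terms, would build one `table` from the combined instruction frequencies of several programs so they can share it; a dynamic dictionary would swap `table` on a context switch.
    
    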

    The Mentality of the Eastern Slavs and Sociocultural Aspects of Integration Processes in the Belarusian-Russian-Ukrainian Borderland

    The article presents the results of a study of the mental characteristics of Russians, Ukrainians and Belarusians. A high degree of closeness between their self-assessments and their mutual assessments of characterological traits is recorded. It is concluded that the kinship of these peoples creates favourable preconditions for the development of integration processes in the Russian-Belarusian-Ukrainian borderland.

    WebAssembly beyond the Web: A Review for the Edge-Cloud Continuum

    The cloud computing environment has changed over the past years, transitioning from a centralized architecture built around big data centers to a dispersed and heterogeneous architecture that incorporates edge devices and processing units. This transformation calls for a cross-platform, interoperable solution, a feature that WebAssembly (Wasm) offers. Wasm can be used as a compact and effective representation of serverless functions or microservices deployed at the cloud edge. This is especially crucial in heterogeneous edge settings, where various hardware and software systems may be employed. By using a common runtime environment, developers can create applications that operate on any Wasm-compatible device without worrying about platform-specific challenges. In this survey, we identify the main challenges and opportunities for Wasm runtimes in the edge-cloud continuum, such as performance optimisation, security, and interoperability with other programming languages and platforms. We provide a comprehensive overview of the current landscape of Wasm outside the web, including possible standardization efforts and best practices for using these runtimes, thus serving as a valuable resource for researchers and practitioners in the field.

    Cost-Effective Scheduling for Kubernetes in the Edge-to-Cloud Continuum

    The edge-to-data-center computing continuum is the aggregation of computing resources located anywhere between the network edge (e.g. close to 5G antennas) and servers in traditional data centers. Kubernetes is the de facto standard for container orchestration. It is very efficient in a data center environment, but it fails to deliver the same performance when edge resources are added. At the edge, resources are more limited and networking conditions change over time. In this paper, we present a methodology that lowers the cost of running applications in the edge-to-cloud computing continuum. A cost-aware scheduler enables this optimization. We also monitor the Key Performance Indicators of the applications to ensure that cost optimizations do not negatively impact their Quality of Service. In addition, to ensure that performance remains optimal even when users are moving, we introduce a background process that periodically checks whether a better location is available for the application. To demonstrate the performance of our scheduling approach, we evaluate it on a vehicle cooperative perception use case, a representative 5G application.
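    The core placement decision such a scheduler makes can be sketched as follows. All of it is illustrative: the node attributes, the latency KPI, and the cost model are invented, and a real implementation would plug into Kubernetes scheduling rather than run standalone.

    ```python
    def pick_node(nodes, cpu_request, max_latency_ms):
        # Keep only nodes that can host the workload AND still meet
        # the application's KPI (here: a latency bound to the user)...
        feasible = [n for n in nodes
                    if n['free_cpu'] >= cpu_request
                    and n['latency_ms'] <= max_latency_ms]
        if not feasible:
            return None  # fall back to default scheduling
        # ...then place the workload on the cheapest feasible node.
        return min(feasible, key=lambda n: n['cost_per_hour'])

    nodes = [
        {'name': 'edge-1',  'free_cpu': 2,  'latency_ms': 5,  'cost_per_hour': 0.30},
        {'name': 'edge-2',  'free_cpu': 1,  'latency_ms': 8,  'cost_per_hour': 0.25},
        {'name': 'cloud-1', 'free_cpu': 64, 'latency_ms': 40, 'cost_per_hour': 0.10},
    ]
    best = pick_node(nodes, cpu_request=2, max_latency_ms=20)
    print(best['name'])  # edge-1: cloud-1 is cheaper but violates the latency KPI
    ```

    The background relocation process described above amounts to periodically re-running this decision with fresh latency measurements and migrating when a cheaper feasible node appears.
    
    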

    Performance Analysis and Benchmarking of a Temperature Downscaling Deep Learning Model

    We present a detailed analysis and performance characterization of a statistical temperature downscaling application used in the MAELSTROM EuroHPC project. The application uses a deep learning methodology to convert low-resolution atmospheric temperature states into high-resolution ones. We have performed in-depth profiling and roofline analysis at different levels (operators, training, distributed training, inference) of the downscaling model on different hardware architectures (Nvidia V100 and A100 GPUs). Finally, we compare the training and inference cost of the downscaling model across various cloud providers. Our results identify the model's bottlenecks, which can be used to enhance the model architecture and to determine hardware configurations that utilize HPC resources efficiently. Furthermore, we provide a comprehensive methodology for in-depth profiling and benchmarking of deep learning models.
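    A roofline analysis reduces to a simple bound: attainable performance is the minimum of the machine's peak compute rate and its memory bandwidth times the kernel's arithmetic intensity. A minimal sketch, using commonly quoted V100 peak figures as an assumption (they are not taken from the paper):

    ```python
    def roofline_gflops(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
        # A kernel is memory-bound until bandwidth * intensity reaches
        # the compute peak; past that ridge point it is compute-bound.
        return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)

    # Assumed Nvidia V100 peaks: ~15700 GFLOP/s FP32, ~900 GB/s HBM2.
    peak, bw = 15700.0, 900.0
    ridge = peak / bw  # ~17.4 FLOP/byte needed to leave the memory roof

    low = roofline_gflops(peak, bw, 1.0)    # memory-bound: 900 GFLOP/s
    high = roofline_gflops(peak, bw, 50.0)  # compute-bound: capped at 15700
    ```

    Plotting each operator of the model at its measured intensity against this roof is what reveals whether a bottleneck is in compute or in memory traffic.
    
    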

    Forms and Systems of Personnel Remuneration as Factors in Raising Enterprise Productivity (the Case of OAO «Гомельский завод литья и нормалей»)

    Transactional memory (TM) is emerging as an attractive synchronization mechanism for concurrent computing. In this work we aim at filling a relevant gap in the TM literature by investigating the issue of energy efficiency for one crucial building block of TM systems: contention management. Green-CM, the solution proposed in this paper, is the first contention management scheme explicitly designed to jointly optimize both performance and energy consumption. To this end, Green-CM combines three key mechanisms: i) it leverages a novel asymmetric design, which combines different back-off policies in order to take advantage of dynamic frequency and voltage scaling; ii) it introduces an energy-efficient design of the back-off mechanism, which combines spin-based and sleep-based implementations; iii) it makes extensive use of self-tuning mechanisms to pursue optimal efficiency across highly heterogeneous workloads. We evaluate Green-CM from both the energy and performance perspectives, and show that it can achieve enhanced efficiency by up to 2.35 times with respect to state-of-the-art contention managers, with an average gain of more than 60% when using 64 threads.
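    The hybrid spin/sleep back-off of mechanism (ii) can be sketched like this. The threshold and delay values are invented for illustration; Green-CM self-tunes such parameters at runtime rather than hard-coding them.

    ```python
    import random
    import time

    def backoff(attempt, spin_threshold=3, base_us=1.0):
        # Randomized exponential delay, as in classic contention back-off.
        delay_s = random.uniform(0, base_us * (2 ** attempt)) * 1e-6
        if attempt < spin_threshold:
            # Short waits: busy-wait. Low wake-up latency, but the core
            # stays active at high frequency (more energy).
            deadline = time.perf_counter() + delay_s
            while time.perf_counter() < deadline:
                pass
        else:
            # Long waits: sleep. The core idles, which lets DVFS lower
            # frequency and voltage -- the energy saving the asymmetric
            # design is built to exploit.
            time.sleep(delay_s)
        return delay_s

    waited = [backoff(a) for a in range(6)]
    assert all(w >= 0.0 for w in waited)
    ```

    The design question the paper studies is where to put `spin_threshold`: spinning too long burns energy, sleeping too eagerly adds restart latency on contended transactions.
    
    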

    Student-centred GP ambassadors: Perceptions of experienced clinical tutors in general practice undergraduate training

    Objective. To explore experienced general practitioner (GP) tutor perceptions of a skilled GP tutor of medical students. Design. Interview study based on focus groups. Setting. Twenty GPs experienced in tutoring medical students at primary health care centres in two Swedish regions were interviewed. Method. Four focus-group interviews were analysed using qualitative content analysis. Subjects. Twenty GP tutors, median age 50, specifically selected according to age, gender, and location participated in two focus groups in Gothenburg and Malmö, respectively. Main outcome measures. Meaning units in the texts were extracted, coded and condensed into categories and themes. Results. Three main themes emerged: "Professional as GP and ambassador to general practice", "Committed and student-centred educator", and "Coordinator of the learning environment". Conclusion. Experienced GP tutors describe their skills as a clinical tutor as complex and diversified. A strong professional identity within general practice is vital, and GP tutors describe themselves as ambassadors to general practice, essential to the process of recruiting a new generation of general practitioners. Leaders of clinical education and health care planners must understand the complexity of a clinical tutor's assignment and provide adequate support, time, and resources in order to facilitate a sustainable tutorship and a good learning environment, which could also improve the necessary recruitment of future GPs.